
refactor: Endpoint class as a single entrypoint uniting @remote and ServerlessResource-based classes#223

Merged
KAJdev merged 36 commits into main from zeke/single-entrypoint
Mar 5, 2026

Conversation

@KAJdev
Contributor

@KAJdev KAJdev commented Feb 25, 2026

Unified Endpoint API

Replaces 8 resource config classes (LiveServerless, CpuLiveServerless, LiveLoadBalancer, CpuLiveLoadBalancer, ServerlessEndpoint, CpuServerlessEndpoint, LoadBalancerSlsResource, CpuLoadBalancerSlsResource) and the @remote decorator with a single Endpoint class.

Fixes AE-2259
Fixes AE-2306

Queue-based

  @Endpoint(name="worker", gpu=GpuType.ANY, dependencies=["torch"])
  async def predict(input_data: dict) -> dict:
      ...

Load-balanced

  api = Endpoint(name="service", cpu="cpu3c-1-2", workers=(1, 3))

  @api.post("/predict")
  async def predict(data: dict) -> dict:
      ...

Client mode

  ep = Endpoint(id="ep-abc123")
  job = await ep.run({"prompt": "hello"})
  await job.wait()
  print(job.output)

What changed

  • Endpoint is a facade that internally creates the old resource config objects, so the existing deployment/provisioning/handler pipeline continues working unchanged
  • QB vs LB is inferred from usage pattern (decorator vs route registration)
  • GPU vs CPU is a parameter (gpu= / cpu=), not a class choice
  • EndpointJob wraps job responses with status(), wait(), cancel(), and property access (job.id, job.output, job.error, job.done)
  • Scanner, manifest builder, and resource discovery all recognize Endpoint patterns
  • Legacy classes and @Remote emit DeprecationWarning on import/use
  • Skeleton templates (flash init) generate the new API

@KAJdev KAJdev requested a review from deanq February 25, 2026 23:15
@KAJdev KAJdev marked this pull request as ready for review February 25, 2026 23:46
@runpod-Henrik
Contributor

pulled this down to verify with my examples a few notes:

  1. ServerlessScalerType not exposed

04_scaling_performance/01_autoscaling/gpu_worker.py configures scaling strategies:

  scale_to_zero_config = LiveServerless(
      name="04_01_scale_to_zero",
      gpus=[GpuGroup.ANY],
      workersMin=0, workersMax=3, idleTimeout=5,
      scalerType=ServerlessScalerType.QUEUE_DELAY,
      scalerValue=4,
  )

This controls how autoscaling decides to add workers — QUEUE_DELAY scales based on how long jobs wait in queue,
REQUEST_COUNT scales based on pending request volume. The example shows three strategies side by side (scale-to-zero,
always-on, high-throughput) with different scalerType/scalerValue combos.

Endpoint() doesn't have these params, so there's no way to express this:

What we'd want:

  @endpoint(name="worker", gpu=GpuGroup.ANY, workers=(0, 3),
            scaler_type=ServerlessScalerType.QUEUE_DELAY, scaler_value=4)
  async def scale_to_zero_inference(payload: dict) -> dict: ...

Could we add scaler_type / scaler_value (or a combined scaler= param)?


  2. PodTemplate features not surfaced
    (new example not checked in yet)
    03_advanced_workers/04_custom_images/gpu_worker.py uses PodTemplate for full Docker control:

  template = PodTemplate(
      name="03_04_custom_template",
      imageName="runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04",
      containerDiskInGb=30,
      dockerArgs="--shm-size=2g",
      startScript="echo 'Worker starting with custom image'",
      ports="8080/http",
      # containerRegistryAuthId="your-auth-id",  # for private registries
  )

  gpu_config = ServerlessEndpoint(
      name="03_04_custom_images",
      gpus=[GpuGroup.ADA_24],
      template=template,
      workersMin=0, workersMax=2, idleTimeout=5,
  )

Endpoint(image=) only takes the image name string. The other template features — dockerArgs (e.g. shared memory size),
startScript (pre-run setup), ports, containerDiskInGb, and containerRegistryAuthId (private registries) — have no
equivalent. These are important for real-world deployments where the default Flash image doesn't work (custom CUDA
versions, private model servers, etc.).

Could we either add a template= param that accepts a PodTemplate, or surface these as top-level kwargs on Endpoint?


  3. Class-based @endpoint?

Two examples use @Remote on a class for stateful workers. Here's the pattern from
05_data_workflows/01_network_volumes/gpu_worker.py:

  @Remote(resource_config=gpu_config, dependencies=["diffusers", "torch", "transformers"])
  class SimpleSD:
      def __init__(self):
          # Runs once at worker startup — loads 4GB model into GPU memory
          self.pipe = StableDiffusionPipeline.from_pretrained(...)
          self.pipe = self.pipe.to("cuda")

      async def generate_image(self, prompt: str) -> dict:
          # Uses self.pipe — already warm in GPU memory
          image = self.pipe(prompt=prompt, ...).images[0]
          return {"image_path": image_path}

The class is instantiated once when the worker boots. The model stays in GPU memory via self.pipe and every request
calls methods on the same instance — no re-loading a 4GB model per request.

With function-based @endpoint, there's no self to hold state:

  @endpoint(name="worker", gpu=GpuGroup.ANY, dependencies=["diffusers", "torch"])
  async def generate_image(prompt: str) -> dict:
      # Re-loading a 4GB model on every request — 30+ seconds of overhead each time
      pipe = StableDiffusionPipeline.from_pretrained(...)
      pipe = pipe.to("cuda")
      image = pipe(prompt=prompt, ...).images[0]
      return {"image_path": image_path}

Does @endpoint(...) support decorating classes the same way @Remote does? If not, we'd need a workaround (module-level
global with lazy init) or keep these on the legacy API.


  4. GpuGroup vs GpuType

The PR's skeleton templates use GpuType:

  @endpoint(name="gpu_worker", gpu=GpuType.ANY, dependencies=["torch"])
  async def gpu_hello(input_data: dict) -> dict: ...

But existing examples all use GpuGroup:

  @endpoint(name="worker", gpu=GpuGroup.ADA_24)
  async def my_func(payload: dict) -> dict: ...

Both work — Endpoint(gpu=) accepts either. But they mean different things: GpuType is a specific GPU model (e.g. RTX
4090), GpuGroup is a family (e.g. all Ada 24GB cards: 4090, L4, etc.). For examples, which should we standardize on?
Current thinking:

  • GpuGroup for "give me any GPU in this tier" (most examples)
  • GpuType only for the GPU selection example that targets a specific card

@KAJdev
Contributor Author

KAJdev commented Feb 26, 2026

This controls how autoscaling decides to add workers — QUEUE_DELAY scales based on how long jobs wait in queue,
REQUEST_COUNT scales based on pending request volume. The example shows three strategies side by side (scale-to-zero,
always-on, high-throughput) with different scalerType/scalerValue combos.

Endpoint() doesn't have these params, so there's no way to express this:

will work on adding those parameters

Endpoint(image=) only takes the image name string. The other template features — dockerArgs (e.g. shared memory size),
startScript (pre-run setup), ports, containerDiskInGb, and containerRegistryAuthId (private registries) — have no
equivalent. These are important for real-world deployments where the default Flash image doesn't work (custom CUDA
versions, private model servers, etc.).

Could we either add a template= param that accepts a PodTemplate, or surface these as top-level kwargs on Endpoint?

👍

Does @endpoint(...) support decorating classes the same way @Remote does? If not, we'd need a workaround (module-level
global with lazy init) or keep these on the legacy API.

Endpoint does support classes

Both work — Endpoint(gpu=) accepts either. But they mean different things: GpuType is a specific GPU model (e.g. RTX
4090), GpuGroup is a family (e.g. all Ada 24GB cards: 4090, L4, etc.). For examples, which should we standardize on?

we should prefer GpuType in simpler examples, since it is easier to understand, but expand to GpuGroup for situations when more scale is important

@runpod-Henrik
Contributor

QA Report

Status: WARN
PR: #223 — feat: single entrypoint
Agent: flash-qa (PR mode)

CI Status

All 6 Quality Gates pass (Python 3.10–3.14 + Build Package). No CI regressions detected.

Note: Unable to run local tests — worktree branch checkout was blocked by sandbox policy. All analysis below is from static diff review and CI results.

PR Scope

  • 13 source files changed/added (695-line endpoint.py is new)
  • 9 test files added with 161 test methods
  • Key changes: new Endpoint class, deprecation warnings on legacy classes/remote, scanner + discovery + manifest + provisioner updates for Endpoint patterns

Test File Summary

| Test File | Tests | Coverage Area |
| --- | --- | --- |
| test_endpoint.py | ~55 | Endpoint construction, init params, QB/LB decorators, resource config type matrix (2x2x2), caching |
| test_endpoint_client.py | ~40 | EndpointJob lifecycle, run/runsync/cancel, _ensure_endpoint_ready (id + image modes), LB client requests, end-to-end flows |
| test_deprecations.py | ~20 | Deprecation warnings for 8 legacy classes + remote decorator, non-deprecated names verified |
| test_discovery_endpoint.py | ~10 | ResourceDiscovery with Endpoint LB patterns, resolve, directory scan, mixed legacy+Endpoint |
| test_skeleton_endpoint.py | 4 | Skeleton templates use Endpoint API (gpu/cpu/lb workers + README) |
| test_scanner_endpoint.py | ~20 | Scanner: QB functions/classes, LB routes, all HTTP methods, mixed patterns, edge cases |
| test_manifest_endpoint.py | ~6 | Manifest building with Endpoint QB/LB metadata, deployment config extraction with unwrapping |
| test_run_endpoint.py | ~7 | flash run: scan + server generation for QB/LB/mixed Endpoint patterns |
| test_resource_provisioner.py | +4 | Endpoint resource_type resolution to correct internal classes (4 combinations) |

PR Diff Analysis

  • No bare exceptions
  • No hardcoded secrets (RUNPOD_API_KEY properly popped from env dict)
  • No print() in library source (print() only in skeleton if __name__ == "__main__" blocks and README examples — acceptable)
  • Public API surface changes documented: Endpoint and EndpointJob added to __all__, TYPE_CHECKING imports updated
  • Deprecation warnings added with stacklevel=2 for correct caller attribution
  • _internal=True flag on remote() suppresses double-warnings when called from Endpoint internals
  • Resource config caching prevents redundant provisioning
  • remote() deprecation is a breaking behavioral change — all existing users importing remote will get DeprecationWarning. This is intentional but should be documented in release notes.

Observations & Issues

1. Dual-purpose methods create subtle API surface
The .get()/.post()/.put()/.delete()/.patch() methods return either a decorator (no data arg, non-client mode) or a coroutine (client mode). This is determined by self.is_client. While tested, this design could confuse users:

ep = Endpoint(name="my-api")
ep.post("/compute")          # returns a decorator
ep_client = Endpoint(id="x")
ep_client.post("/compute")   # returns a coroutine

The distinction is tested but the boundary between "no data arg = decorator" vs "data=None = client call" is not explicitly tested. A user calling ep.post("/compute", None) in decorator mode would get a coroutine instead of a decorator.

2. _is_live_provisioning() default heuristic
When FLASH_IS_LIVE_PROVISIONING is unset, the function defaults to live mode unless RUNPOD_ENDPOINT_ID or RUNPOD_POD_ID is set. This heuristic is reasonable but not tested — no test verifies the fallback behavior when the env var is missing.
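The fallback is easy to pin down in tests if the heuristic is written against an injectable env mapping. A sketch of my reading of the logic as described above, not the actual SDK source:

```python
import os

def is_live_provisioning(env=None):
    # Sketch of the heuristic as described in the observation; not SDK code.
    env = os.environ if env is None else env
    flag = env.get("FLASH_IS_LIVE_PROVISIONING")
    if flag is not None:
        return flag.lower() == "true"
    # Env var unset: default to live mode unless we appear to be running
    # inside a deployed worker (an endpoint or pod ID is present).
    return "RUNPOD_ENDPOINT_ID" not in env and "RUNPOD_POD_ID" not in env
```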

3. Endpoint with name=None and routes
Endpoint(name=None, id=None) raises ValueError, and Endpoint() with no arguments hits the same check. A scanner edge-case test nonetheless uses my_api = Endpoint() without name= — executing that file would raise ValueError("name or id is required") during manifest extraction, but the test passes because AST-only scanning never executes the code.

Test Quality Assessment

Strengths:

  • Full 2x2x2 resource config type matrix tested (qb/lb x gpu/cpu x live/deploy = 8 combinations)
  • Client mode end-to-end flows well covered (run, wait, cancel, timeout)
  • Edge cases: FastAPI @app.get() not falsely matched, unregistered variable routes ignored, nested directories, cross-call detection
  • Mixed legacy + Endpoint coexistence tested at scanner, discovery, and manifest levels
  • Assertion quality is good — specific field checks, not just len() assertions

Missing Coverage:

  • _is_live_provisioning() standalone tests — no test verifies the fallback heuristic when env var is unset
  • _normalize_gpu() / _normalize_cpu() error paths — invalid types (e.g., gpu="string") not tested
  • Endpoint.__call__ with invalid func — what happens if you @ep decorate a non-callable?
  • Client mode PUT/DELETE/PATCH calls — only GET and POST client calls tested in TestClientRequest; PUT, DELETE, PATCH use the same _client_request path but are not explicitly verified
  • EndpointJob.wait() backoff intervals — the exponential backoff logic (_POLL_INITIAL_INTERVAL, _POLL_BACKOFF_FACTOR, _POLL_MAX_INTERVAL) is not verified; tests only check correctness, not timing behavior
  • Thread safety of _cached_resource_config — no concurrent access test (low risk for typical usage)
  • Deprecation warning stacklevel — no test verifies the warning points to the caller's frame, not the internal frame

Suggested Improvements:

  1. Add a parametrized test for _normalize_gpu and _normalize_cpu with invalid inputs
  2. Add a test for _is_live_provisioning() with various env combinations (unset, "true", "false", RUNPOD_ENDPOINT_ID set)
  3. Consider adding a PUT/DELETE/PATCH client call test for completeness (even if trivially same path)
  4. The _mock_httpx_client helper is well-designed but could be moved to conftest for reuse
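Suggestion 1 could start from a table-driven check along these lines. `normalize_gpu` is a stand-in for the real `_normalize_gpu` (whose accepted inputs I'm guessing at); in the suite the same cases would go through pytest.mark.parametrize:

```python
def normalize_gpu(gpu):
    # Stand-in validator showing the error path worth covering; the real
    # _normalize_gpu presumably also accepts GpuType/GpuGroup values.
    allowed = {"ANY", "ADA_24"}
    if not isinstance(gpu, str) or gpu not in allowed:
        raise ValueError(f"invalid gpu spec: {gpu!r}")
    return gpu

# Invalid inputs that should raise rather than pass through silently.
for bad in [None, 42, 3.5, "not-a-gpu", ["ANY"], {"gpu": "ANY"}]:
    try:
        normalize_gpu(bad)
    except ValueError:
        pass  # expected
    else:
        raise AssertionError(f"{bad!r} was accepted")
```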

Review Comments Integration

The PR already addresses the 4 review items from @runpod-Henrik:

  1. scaler_type/scaler_value — added in commit 3787b4b (params on Endpoint + manifest extraction + provisioner support)
  2. PodTemplate — template= param added in the same commit
  3. Class-based @endpoint — confirmed supported, tested in TestEndpointQBClass and TestScanEndpointWorkers.test_endpoint_class_discovered_as_qb
  4. GpuGroup vs GpuType — skeleton templates use GpuType.ANY, examples use GpuGroup — per author's preference

Recommendation

MERGE WITH NOTES

The PR is solid — 161 tests, CI green on all Python versions, comprehensive coverage of the new Endpoint API. The dual-purpose method design and _is_live_provisioning() heuristic warrant documentation but are not blockers. Two suggestions before merge:

  1. Add release notes documenting the remote() deprecation warning (all existing code will emit warnings)
  2. Consider adding 2-3 tests for the missing _is_live_provisioning() fallback and _normalize_gpu/_normalize_cpu error paths

Generated by flash-qa agent

@deanq deanq changed the title feat: single entrypoint feat: Endpoint class as a single entrypoint uniting @remote and ServerlessResource-based classes Mar 3, 2026
@deanq deanq changed the title feat: Endpoint class as a single entrypoint uniting @remote and ServerlessResource-based classes refactor: Endpoint class as a single entrypoint uniting @remote and ServerlessResource-based classes Mar 3, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a unified Endpoint facade as the single user-facing API for Flash endpoints, consolidating the previous @remote decorator + multiple resource config classes into one entrypoint while keeping the underlying provisioning/handler pipeline intact via internal resource-config unwrapping.

Changes:

  • Added runpod_flash.endpoint.Endpoint and EndpointJob to support QB (decorator) mode, LB (route registration) mode, and client mode (id= / image=).
  • Updated scanning/discovery/manifest generation and skeleton templates/docs to recognize and showcase the new Endpoint patterns.
  • Marked legacy @remote and legacy resource config classes as deprecated (warnings + compatibility import paths), and added extensive unit tests.

Reviewed changes

Copilot reviewed 30 out of 31 changed files in this pull request and generated 5 comments.

Summary per file:

| File | Description |
| --- | --- |
| tests/unit/test_skeleton_endpoint.py | Verifies skeleton templates use Endpoint (not @remote / legacy classes). |
| tests/unit/test_endpoint_client.py | Adds unit tests for client-mode calls and EndpointJob lifecycle methods. |
| tests/unit/test_endpoint.py | Adds broad unit coverage for Endpoint construction, mode inference, and internal config selection. |
| tests/unit/test_discovery_endpoint.py | Ensures ResourceDiscovery recognizes Endpoint LB patterns and resolves to deployable resources. |
| tests/unit/test_deprecations.py | Ensures deprecation warnings are emitted for legacy imports/usages. |
| tests/unit/runtime/test_resource_provisioner.py | Extends resource provisioner tests for manifest resource_type="Endpoint". |
| tests/unit/cli/commands/test_run_endpoint.py | Tests worker scanning + server generation for Endpoint QB/LB patterns. |
| tests/unit/cli/commands/build_utils/test_scanner_endpoint.py | Adds comprehensive scanner tests for Endpoint QB/LB patterns and edge cases. |
| tests/unit/cli/commands/build_utils/test_manifest_endpoint.py | Tests manifest-building and deployment-config extraction for Endpoint resources. |
| src/runpod_flash/runtime/resource_provisioner.py | Adds manifest-time mapping from Endpoint to underlying resource classes and extracts scaler fields. |
| src/runpod_flash/endpoint.py | Implements Endpoint + EndpointJob, including internal resource-config selection and client methods. |
| src/runpod_flash/core/discovery.py | Extends discovery to detect Endpoint LB usage (ep = Endpoint(...) + @ep.get/post/...). |
| src/runpod_flash/client.py | Deprecates remote() (warning) and adds _internal flag for Endpoint internals. |
| src/runpod_flash/cli/utils/skeleton_template/lb_worker.py | Updates LB skeleton template to use Endpoint route registration. |
| src/runpod_flash/cli/utils/skeleton_template/gpu_worker.py | Updates GPU QB skeleton template to @Endpoint(...) and adds a small main test harness. |
| src/runpod_flash/cli/utils/skeleton_template/cpu_worker.py | Updates CPU QB skeleton template to @Endpoint(...) and adds a small main test harness. |
| src/runpod_flash/cli/utils/skeleton_template/README.md | Updates template README to document QB/LB/client usage via Endpoint. |
| src/runpod_flash/cli/commands/run.py | Updates CLI “no workers found” guidance to show Endpoint examples. |
| src/runpod_flash/cli/commands/build_utils/scanner.py | Adds Endpoint QB/LB AST detection and metadata emission. |
| src/runpod_flash/cli/commands/build_utils/manifest.py | Unwraps Endpoint when extracting deployment config + adds scaler fields. |
| src/runpod_flash/cli/commands/_run_server_helpers.py | Unwraps Endpoint in LB execution helper before provisioning. |
| src/runpod_flash/__init__.py | Exposes Endpoint/EndpointJob and adds deprecation warnings for legacy names. |
| docs/Using_Remote_With_LoadBalancer.md | Rewritten to describe LB endpoints via Endpoint rather than @remote + LB classes. |
| docs/Load_Balancer_Endpoints.md | Updates docs to position Endpoint as user-facing API, legacy classes as internal. |
| docs/LoadBalancer_Runtime_Architecture.md | Updates runtime docs terminology and examples to Endpoint. |
| docs/GPU_Provisioning.md | Updates examples to use Endpoint patterns and scaler params. |
| docs/Flash_SDK_Reference.md | Updates SDK reference to Endpoint as primary API + adds EndpointJob section. |
| docs/Flash_Deploy_Guide.md | Updates deployment guide terminology and diagrams to Endpoint. |
| docs/Deployment_Architecture.md | Updates architecture doc to reflect Endpoint-based scanning/manifest. |
| docs/Cross_Endpoint_Routing.md | Updates routing doc examples to use Endpoint. |
| .gitignore | Adds /.pi ignore entry. |


Contributor

@runpod-Henrik runpod-Henrik left a comment


Code Review — PR #223: Unified Endpoint Class

Nice refactor unifying 8 resource config classes + @remote into a single Endpoint facade. The API design is clean and the test coverage is solid (+2,688 lines of tests). Found a few issues:


Bug 1 (HIGH): _normalize_workers accepts negative and inverted values

endpoint.py — _normalize_workers()

def _normalize_workers(workers):
    if workers is None:
        return (0, 1)
    if isinstance(workers, int):
        return (0, workers)  # workers=-5 → (0, -5)
    if isinstance(workers, (tuple, list)) and len(workers) == 2:
        return (int(workers[0]), int(workers[1]))  # (-3, -1) accepted
  • Endpoint(name="x", workers=-5) → (0, -5) silently accepted
  • Endpoint(name="x", workers=(10, 2)) → min > max silently accepted
  • Endpoint(name="x", workers=(3.99, 7.01)) → silently truncated to (3, 7) by int()

Fix: Add validation:

min_w, max_w = ...
if min_w < 0 or max_w < 0:
    raise ValueError(f"workers cannot be negative: ({min_w}, {max_w})")
if min_w > max_w:
    raise ValueError(f"workers min ({min_w}) cannot exceed max ({max_w})")

Bug 2 (MEDIUM): No duplicate route detection in _route()

endpoint.py — _route()

def _route(self, method: str, path: str):
    # ...validation...
    def decorator(func):
        self._routes.append({"method": method, "path": path, ...})

Routes are appended to a list with no duplicate check. Two functions registered on @api.post("/predict") silently coexist — last one wins at runtime.

Fix: Check before appending:

existing = [(r["method"], r["path"]) for r in self._routes]
if (method, path) in existing:
    raise ValueError(f"duplicate route: {method} {path}")

Bug 3 (MEDIUM): No reserved path validation

endpoint.py — _route()

The docs explicitly list /execute and /ping as reserved paths, but _route() doesn't block them. @api.post("/execute") or @api.get("/ping") would collide with framework endpoints.

Fix: Add to _route():

_RESERVED_PATHS = frozenset({"/execute", "/ping"})
if path in _RESERVED_PATHS:
    raise ValueError(f"path {path} is reserved by the framework")

Bug 4 (MEDIUM): Scanner hardcodes is_live_resource=True for all Endpoint patterns

scanner.py — _build_endpoint_qb_metadata() and _build_endpoint_route_metadata()

Both methods hardcode is_live_resource=True. This means flash deploy (non-live provisioning) would still see is_live_resource=True, potentially using the wrong resource class (Live* instead of deploy-time *Endpoint).

The _build_resource_config() in endpoint.py correctly calls _is_live_provisioning() at runtime, but the scanner metadata is set statically at scan time.

Fix: Either defer the flag or set it dynamically:

is_live_resource=_is_live_provisioning(),  # or leave as None, let downstream decide

Bug 5 (MEDIUM): get()/post() silently ignore data in decorator mode

endpoint.py — get()

def get(self, path: str, data: Any = None, **kwargs):
    if self.is_client:
        return self._client_request("GET", path, data, **kwargs)
    return self._route("GET", path)  # data silently dropped

In decorator mode, data and **kwargs are silently ignored. A user writing @api.get("/health", data={"key": "val"}) gets no error — the data is silently dropped.

Fix: In decorator mode, validate no extra args:

if data is not None or kwargs:
    raise TypeError(
        "data and kwargs are only valid in client mode (Endpoint with id= or image=). "
        "In decorator mode, use @api.get('/path') with no data argument."
    )

Bug 6 (LOW): EndpointJob.wait() can overshoot timeout

endpoint.py — EndpointJob.wait()

while not self.done:
    if deadline is not None and time.monotonic() >= deadline:
        raise TimeoutError(...)
    await asyncio.sleep(interval)  # sleeps THEN checks status
    await self.status()  # network call, can take arbitrary time
    interval = min(interval * _POLL_BACKOFF_FACTOR, _POLL_MAX_INTERVAL)

The deadline check happens before asyncio.sleep(interval) + the network call to status(). With _POLL_MAX_INTERVAL=5.0, the actual wait can overshoot by 5s + network latency. Not critical since it's a best-effort timeout, but worth documenting or checking deadline after the sleep too.
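One way to bound the overshoot, sketched here outside the SDK with a generic async poll callback and illustrative constants: re-check the deadline after waking, and clamp each sleep so the loop never sleeps past it.

```python
import asyncio
import time

_POLL_INITIAL_INTERVAL = 0.01  # illustrative values, not the SDK's
_POLL_BACKOFF_FACTOR = 2.0
_POLL_MAX_INTERVAL = 0.05

async def wait_until_done(poll, timeout=None):
    """Poll an async `poll()` (returns True when done) without overshooting."""
    deadline = None if timeout is None else time.monotonic() + timeout
    interval = _POLL_INITIAL_INTERVAL
    while not await poll():
        if deadline is not None:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise TimeoutError("job did not finish before timeout")
            interval = min(interval, remaining)  # wake no later than the deadline
        await asyncio.sleep(interval)
        interval = min(interval * _POLL_BACKOFF_FACTOR, _POLL_MAX_INTERVAL)
```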


Bug 7 (LOW): Discovery string match "Endpoint" too broad

discovery.py — content check

if "@remote" not in content and "Endpoint" not in content:
    continue  # skip file

This matches any file containing the substring "Endpoint" anywhere — comments, docstrings, variable names like api_endpoint_url. This is a pre-filter so false positives just mean extra AST parsing (performance, not correctness), but could be tightened.
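A tighter pre-filter could require a call-like occurrence instead of a bare substring. Still a heuristic (an occurrence inside a comment would match), which is acceptable for a pre-filter; a sketch:

```python
import re

# Match "@remote" as a word, or "Endpoint" immediately followed by "(",
# i.e. a constructor call such as Endpoint(...) or ServerlessEndpoint(...),
# instead of any substring occurrence of "Endpoint".
_ENDPOINT_HINT = re.compile(r"@remote\b|Endpoint\s*\(")

def might_define_endpoints(content: str) -> bool:
    return _ENDPOINT_HINT.search(content) is not None
```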


Overall this is well-structured. The main concerns are Bug 1 (negative workers will propagate to the API and either error cryptically or create broken endpoints) and Bug 2/3 (route collisions are a common user error that should be caught early).

Contributor

@runpod-Henrik runpod-Henrik left a comment


Follow-up Review — PR #223

Nice work addressing the feedback from the first review. 5 of 7 original bugs are now fixed with tests. Here's where things stand:

Previously reported — now FIXED (thank you!)

  • ✅ Bug 1: _normalize_workers negative/inverted validation (commit 2b7e838)
  • ✅ Bug 2: Duplicate route detection (commit 2b7e838)
  • ✅ Bug 3: Reserved path validation (commit 2b7e838)
  • ✅ Bug 5: get()/post() data silently dropped in decorator mode (commit 2b7e838)
  • ✅ _ClientCoroutine wrapper gives clear errors when using client endpoints as decorators (commit 7cf0cf8)
  • ✅ R2 presigned URL auth header fix in upload_build() (commit c4cf791)

Still open from original review

Bug 4 (MEDIUM): Scanner hardcodes is_live_resource=True

_build_endpoint_qb_metadata(), _build_endpoint_route_metadata(), and _register_endpoint_variable() all hardcode is_live_resource=True. During flash deploy, the scanner metadata would claim live mode even when deploying. The runtime _build_resource_config() correctly calls _is_live_provisioning(), but the scanner metadata doesn't match.

Likely low blast radius since the manifest extraction unwraps Endpoint and calls _build_resource_config() which does the right thing — but the scanner metadata is misleading.

New finding

_ensure_endpoint_ready() caches URL for image= mode regardless of lb flag

# image= mode: use cached result
if self._endpoint_url is not None:
    return self._endpoint_url  # always returns first-cached format

For id= mode this is handled correctly (resolves fresh each time). But for image= mode, if the first call is QB-style (ep.run(...) → path URL), then an LB-style call (ep.get("/path") → subdomain URL) returns the wrong format from cache.

Fix: cache both formats, or re-derive from the deployed ID:

if self._endpoint_url is not None:
    if lb:
        return self._resolve_lb_url(self._deployed_id)
    return self._resolve_qb_url(self._deployed_id)

Overall

This is in great shape — the validation, error messages, and test coverage are significantly improved since the first review. The _ClientCoroutine wrapper is a particularly nice touch for catching the decorator-on-client-endpoint mistake. The scaler/template additions round out the feature set.

The is_live_resource hardcoding and URL caching bug are the only remaining concerns — neither is a blocker but the URL caching could cause confusing behavior for image= endpoints that make both QB and LB calls.

@flash-singh0 flash-singh0 self-requested a review March 5, 2026 00:45

@flash-singh0 flash-singh0 left a comment


looks good, bugs raised by copilot are low priority

@KAJdev KAJdev merged commit 5c3f3a6 into main Mar 5, 2026
6 checks passed
@KAJdev KAJdev deleted the zeke/single-entrypoint branch March 5, 2026 00:48
